The relationship between a time-dependent covariate and survival times is
usually evaluated via the Cox model. Time-dependent covariates are generally
available as longitudinal data collected regularly during the course of the
study. A frequent problem, however, is the occurrence of missing covariate
data. A recent approach to estimation in the Cox model in this case jointly
models survival and the longitudinal covariate. However, theoretical
justification of this approach is still lacking. In this paper we prove
existence and consistency of the maximum likelihood estimators in a joint
model. The asymptotic distribution of the estimators is given along with a
consistent estimator of the asymptotic variance.

1. Introduction. The commonly used Cox [6] regression model postulates that
the hazard function for the failure time $T$ associated with a time-varying
covariate $Z$ takes the form

(1.1) $\lambda(t; Z) = \lambda_0(t) \exp[\beta_0 Z(t)]$,

where $\beta_0$ is an unknown regression parameter and $\lambda_0$ is an
unspecified baseline hazard function. The statistical problem is that of
estimating $\beta_0$ and the cumulative baseline hazard function
$\Lambda_0(t) = \int_0^t \lambda_0(s)\,ds$ on the basis of $n$ possibly
right-censored survival times $X_1, \ldots, X_n$ and the corresponding
covariates $Z_1, \ldots, Z_n$, where $Z_i$ is observed on the interval
$[0, X_i]$.
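
As a minimal sketch of model (1.1), the hazard at time t can be evaluated
from a baseline hazard and a covariate path. The function name `cox_hazard`
and the callables `baseline_hazard` and `covariate_path` are illustrative
assumptions, not notation from the paper.

```python
import math

def cox_hazard(t, baseline_hazard, beta, covariate_path):
    """Hazard lambda(t; Z) = lambda_0(t) * exp(beta * Z(t)) of model (1.1).

    baseline_hazard and covariate_path are hypothetical callables
    mapping a time t to a float.
    """
    return baseline_hazard(t) * math.exp(beta * covariate_path(t))

# Illustration: constant baseline hazard 0.5 and covariate path Z(t) = t.
example = cox_hazard(2.0, lambda t: 0.5, 0.1, lambda t: t)
```

With beta = 0 the covariate has no effect and the function returns the
baseline hazard itself.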

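By maximizing the partial likelihood [7], one can obtain an estimator of
$\beta_0$ that is consistent and asymptotically normal with a covariance
matrix which can be consistently estimated [1]. Letting $\hat\beta$ be the
maximum partial likelihood estimator of $\beta$, Breslow [3, 4] suggested
estimating $\Lambda_0(t)$ by

$\hat\Lambda_n(t) = \sum_{i : X_i \le t} \frac{\Delta_i}{\sum_{j=1}^n \exp[\hat\beta Z_j(X_i)]\, 1_{\{X_i \le X_j\}}}$,

where $\Delta_i$ indicates an observed failure for subject $i$, and
$\sqrt{n}(\hat\Lambda_n - \Lambda_0)$ converges weakly to a Gaussian process.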
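
The Breslow-type estimator can be sketched in a few lines. This is only an
illustration under a simplifying assumption: the covariate is taken to be
constant in time for each subject, so that a single value `z[j]` replaces
$Z_j(X_i)$. The function name and argument names are hypothetical.

```python
import numpy as np

def breslow_cumulative_hazard(x, delta, z, beta_hat):
    """Breslow-type estimate of the cumulative baseline hazard.

    x: observed times, delta: failure indicators (1 = event),
    z: time-fixed covariate values (simplifying assumption),
    beta_hat: a given estimate of the regression parameter.
    Returns the sorted distinct event times and the estimate at those times.
    """
    x = np.asarray(x, dtype=float)
    delta = np.asarray(delta, dtype=int)
    weights = np.exp(beta_hat * np.asarray(z, dtype=float))
    event_times = np.unique(x[delta == 1])
    jumps = np.array([
        # number of events at t divided by the weighted size of the risk set
        np.sum((x == t) & (delta == 1)) / np.sum(weights[x >= t])
        for t in event_times
    ])
    return event_times, np.cumsum(jumps)

times, cumhaz = breslow_cumulative_hazard(
    [1.0, 2.0, 3.0], [1, 0, 1], [0.0, 0.0, 0.0], 0.0
)
```

With `beta_hat = 0` the weights are all one and the estimate reduces to the
Nelson-Aalen estimator.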

To apply this methodology, one needs the knowledge of
$\{Z(s) : 0 \le s \le t\}$ for all values $t \le X$. This is generally not
available. Common problems in survival analysis are presence of covariate
measurement error (see among others Dafni and Tsiatis [8], Dupuy [12], Li and
Lin [19], Tsiatis and Davidian [29], DeGruttola, Tsiatis and Wulfsohn [31]
and Wulfsohn and Tsiatis [34]) and occurrence of missing covariate data (see
[13, 14, 15, 20, 26, 35]).

A recent approach to estimation in the Cox model with a missing or
mismeasured covariate consists in jointly modeling survival and the
longitudinal covariate data. An extensive literature has now contributed to
the estimation in such models (see [30] for a review and numerous
references). However, rigorous proofs of the large-sample properties of
estimators obtained from joint models remain an open problem. Note that
simulations by Tsiatis and Davidian [29] show that the joint modeling
approach should yield a consistent and asymptotically normal estimator for
the regression parameter $\beta_0$. Li and Lin [19] provide simulations which
also seem to point to the asymptotic validity of this approach for estimating
parameters in the frailty model with covariate measurement error.

In this paper we propose a joint model for estimating parameters in the 
Cox model with missing values of a longitudinal covariate. Estimation in this 
joint model is carried out via nonparametric maximum likelihood (NPML) 
estimation. We prove consistency and asymptotic normality of the NPML 
estimator, and we give a consistent estimator for the limiting variance. 

The paper is organized as follows: in Section 2 we describe the joint model
and derive the likelihood function. In Section 3 we investigate the
theoretical properties of the model, including identifiability and the
existence of the NPML estimator. In Section 4 we show that the NPML estimator
is consistent and asymptotically normal and we give a consistent estimator of
its asymptotic variance.

2. The statistical model and construction of the joint likelihood. Suppose
that $n$ subjects are observed. For each individual, we observe survival and
covariate data. Denote by $T_i$ the random survival time for individual $i$.
We assume that survival is subject to right censoring, that is, instead of
$T_i$, we actually observe $X_i = \min(T_i, C_i)$ and a failure indicator
$\Delta_i = 1_{\{T_i \le C_i\}}$, where $C_i$ is a random censoring time.
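
The construction of the observed pair from the latent failure and censoring
times can be sketched as follows. The function name is hypothetical, and the
indicator is taken to be 1 when the failure occurs no later than censoring.

```python
import numpy as np

def observe_right_censored(failure_times, censoring_times):
    """Return (X, Delta) with X = min(T, C) and Delta = 1{T <= C}."""
    t = np.asarray(failure_times, dtype=float)
    c = np.asarray(censoring_times, dtype=float)
    x = np.minimum(t, c)
    delta = (t <= c).astype(int)
    return x, delta

x, delta = observe_right_censored([2.0, 5.0, 1.5], [3.0, 4.0, 1.5])
```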

We examine the case of a single covariate $Z$ that is measured over time at
the instants $0 = t_0 < t_1 < t_2 < \cdots$. We denote $Z_i(t_j)$ by
$Z_{i,j}$. For $t > 0$, let $a_t = \max(k : t_k < t)$ be the index of the
last observed value of $Z$ before time $t$.
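
The index $a_t$ is a simple lookup in the sorted grid of measurement times.
A minimal sketch, assuming the grid is stored as an increasing Python list
(the function name is hypothetical):

```python
from bisect import bisect_left

def last_observed_index(measurement_times, t):
    """Return a_t = max{k : t_k < t} for an increasing grid t_0 < t_1 < ...

    bisect_left counts the grid points strictly smaller than t,
    so the last such index is that count minus one.
    """
    k = bisect_left(measurement_times, t) - 1
    if k < 0:
        raise ValueError("no measurement time strictly before t")
    return k

grid = [0.0, 1.0, 2.0, 3.0]
```

Note the strict inequality: for a time that coincides with a measurement
instant, the index of the previous instant is returned.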

The problem is as follows. Suppose that the data consist of i.i.d. replicates
$(X_i, \Delta_i, Z_i(\cdot))$ ($i = 1, \ldots, n$) of
$(X, \Delta, Z(\cdot))$. For each subject $i$, the actually observed data is
an incomplete random vector
$Y_i = (X_i, \Delta_i, Z_{i,0}, \ldots, Z_{i,a_{X_i}})$, where the covariate
value at the time of failure $Z_i(X_i)$ is missing. The goal is to estimate
the unknown true regression parameter $\beta_0$ and the cumulative hazard
function $\Lambda_0(t) = \int_0^t \lambda_0(u)\,du$ ($t > 0$) using the
incomplete vectors $Y_i$ ($i = 1, \ldots, n$).

This work was motivated by a study whose design called for repeated
measurements of a covariate to be made at different times
$t_0 < t_1 < t_2 < \cdots$ on patients until drop-out. The objective is to
evaluate the relationship between drop-out and the longitudinal values. Since
the covariate is measured at the prespecified times $t_j$, Cox regression
using model (1.1) is complicated by missingness of the covariate values at
drop-out times. Some recent approaches to this problem [20, 26, 35] consist
in extrapolating $Z_i(u)$ at failure time using available longitudinal data.
However, for these methods to be valid, it is assumed that drop-out is
ignorable, that is, the probability of drop-out does not depend on the
unobserved covariate value. This hypothesis does not hold in our setting,
since the hazard of drop-out at time $t$ depends on the unobserved $Z(t)$. We
therefore propose to jointly model survival and the covariate in order to use
all available data to estimate the parameters. Applications of this approach
in psychometry and AIDS clinical trials can be found in [14, 15, 22], along
with comparisons with alternative methods.

In order to derive asymptotic results for our estimators, we assume
throughout that the following conditions C1-C7 are satisfied:

C1. Let $\tau$ be a finite time point at which any individual still under
study is censored. Assume that $P(X \ge \tau) > 0$.

C2. Conditional on the observed path of the longitudinal covariate, the
hazard function for $T_i$ is given by $\lambda_0(t) \exp[\beta_0 Z(t)]$.

C3. The covariates $Z_i$ have uniformly bounded total variation, namely,
$\int_0^\tau |dZ_i(t)| + |Z_i(0)| \le c$ for some finite $c > 0$ and all $i$.

C4. Let $f$ denote the joint density of $(Z_0, \ldots, Z_{a_t}, Z(t))$.
Suppose that $f$ depends on an unknown parameter $\alpha$
($\alpha \in \mathbb{R}^p$), that $f$ is continuous with respect to $\alpha$
and has continuous second-order derivatives with respect to $\alpha$. Suppose
also that $f$ is bounded and that, for any $t$,
$f(z_0, \ldots, z_{a_t}, z(t); \alpha) = f(z_0, \ldots, z_{a_t}, z(t); \alpha')$
a.e. implies $\alpha = \alpha'$.

C5. The parameters $\alpha$ and $\beta$ are interior points of known compact
sets $A \subset \mathbb{R}^p$ and $B \subset \mathbb{R}$, respectively.
$\Lambda$ belongs to the set $L$ of absolutely continuous [with respect to
the Lebesgue measure on $[0, \infty)$], nondecreasing functions $\Lambda$
such that $\Lambda(0) = 0$. Assume $\Lambda(\tau) < \infty$.

C6. Let $\theta = (\alpha, \beta, \Lambda)$, and denote by
$\theta_0 = (\alpha_0, \beta_0, \Lambda_0)$ the true value of $\theta$. Let
$\Theta$ denote the parameter space $A \times B \times L$, and suppose that
$\theta_0 \in \Theta$. Denote by $E_{\theta_0}[\cdot]$ the expectation of
random variables taken under the true parameter. Suppose that
$E_{\theta_0}[e^{\beta_0 Z(u)} 1_{\{u \le X\}}]$ is bounded away from 0 on
$[0, \tau]$, that
$E_{\theta_0}[\int_0^X \{Z(u)\}^2 e^{\beta_0 Z(u)}\,d\Lambda_0(u)] > 0$, and
that
$-E_{\theta_0}[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha_0)]$
is positive definite.

C7. It is assumed that $T$ and $C$ are independent given the covariate $Z$.
Moreover, we assume that the censoring distribution does not depend on the
unobserved covariate value, or on $\theta$.

Condition C1 is a standard assumption that supposes that some individuals
are at risk at the end $\tau$ of the experiment.

Condition C2 assumes for ease of presentation that the hazard of failure at
time $t$ depends on the time-varying covariate through its value at $t$. This
could be relaxed, for example, by including a value $Z(t - h)$ ($h > 0$)
(such as in $\lambda_0(t) \exp[\gamma_0 Z(t - h) + \beta_0 Z(t)]$) to study
whether the variation in $Z$ between $(t - h)$ and $t$ influences survival.
We note, however, that, in this case, $\beta_0$ and $\gamma_0$ are not
identifiable if $Z$ is a time-independent covariate. We refer to Chen and
Little [5] for their work on Cox regression with a missing time-independent
covariate.

Condition C4 allows several kinds of parametric models to be used for the
time-dependent covariate. For example, for each individual $i$, the
$Z_{i,j}$'s may be treated as a realization of a multivariate normal random
vector, whose mean may possibly depend on explanatory variables (times of
measurements $t_j$, treatment arms, covariates measured at the entry such as
age, gender, ...). Various correlation structures may be assumed to take
account of the correlation between measurements within each individual (see
[10], Chapters 4 and 5). The parameter $\alpha$ would separate here into
components for the mean and covariance structures. One may in this case
impose additional conditions to ensure identifiability of covariance
parameters, such as a minimum number of repeated measurements on some
subjects. One may also use transition models (see [10], Chapters 7 and 10),
where the conditional distribution of each $Z_{i,j}$ is modeled as a function
of past responses $Z_{i,j-1}, \ldots$ and explanatory variables. Dupuy and
Mesbah [14, 15] propose a joint model which uses a transition model for the
longitudinal data. We refer to [10] for a detailed exposition of various
parametric models for longitudinal data.
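
To make the idea of a transition model concrete, here is a minimal sketch of
one possible choice of the density $f$: a Gaussian first-order transition
model. The parametrization $\alpha = (\mu, \rho, \sigma)$ and the function
name are assumptions made for illustration only; the paper leaves $f$
general.

```python
import math

def ar1_log_density(z, mu, rho, sigma):
    """Log joint density of (z_0, ..., z_a) under a hypothetical Gaussian
    first-order transition model:
        z_0 ~ N(mu, sigma^2),
        z_j | z_{j-1} ~ N(mu + rho * (z_{j-1} - mu), sigma^2).
    """
    def log_normal(value, mean):
        return (-0.5 * math.log(2.0 * math.pi * sigma ** 2)
                - (value - mean) ** 2 / (2.0 * sigma ** 2))

    total = log_normal(z[0], mu)
    # each later value is modeled conditionally on the previous one
    for previous, current in zip(z, z[1:]):
        total += log_normal(current, mu + rho * (previous - mu))
    return total

value = ar1_log_density([0.0, 0.0], mu=0.0, rho=0.5, sigma=1.0)
```

Such a density is smooth in its parameters, which is the kind of regularity
condition C4 asks for; identifiability would still need to be checked for
the chosen parametrization.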

Condition C6 will ensure invertibility of a Fisher information operator in 
the proof of asymptotic normality of the estimators in the joint model. 

Condition C7 is the usual condition of independent and noninformative
censoring. It is usually satisfied in applications, in particular, when a
subject is censored at $\tau$.

The probability measure induced by the observed $Y$ is denoted by
$P_\theta(dy) = f_Y(y; \theta)\,dy$ ($\theta \in \Theta$). We shall obtain
the likelihood $f_Y(y; \theta)$ for the vector of observations
$y = (x, \delta, z_0, \ldots, z_{a_x})$ by first writing the density of
$(y, z)$ for some value $z$ of $Z(X)$, and then by integrating over $z$.
Actually, only a partial likelihood $L(\theta)$ for $Y$ is specified, by
discarding from $f_Y(y; \theta)$ terms pertaining to censoring. From
assumption C7, this does not influence maximization. The resulting likelihood
$L(\theta)$ is

(2.2) $L(\theta) = \int \lambda(x)^{\delta} \exp\left[\delta \beta z - \int_0^x \lambda(u) e^{\beta z(u)}\,du\right] f(z_0, \ldots, z_{a_x}, z; \alpha)\,dz$,

where integration is over $z$ and the indetermination $0^0$ is set to be
equal to 1. In the following, we shall denote by $l(y, z; \theta)$ the
integrand of (2.2).

3. Nonparametric maximum likelihood estimation. We shall first demonstrate
identifiability of the proposed joint model. The proof is given in the
Appendix.

Proposition 1. Under conditions C1-C7, the model is identifiable, that is,
$L(\theta) = L(\theta')$ for almost all $y$ implies $\theta = \theta'$, for
$\theta, \theta' \in \Theta$.

The problem of estimating $\theta$ is semiparametric, since the component
$\Lambda$ is a function. Note that the maximum in $\Lambda$ of this
likelihood function does not exist, so the principle of maximum likelihood is
not applicable here. Nevertheless, this principle can be conveniently
modified to yield a reasonable estimator of the function $\Lambda$, as well
as of $\beta$ and $\alpha$.

We assume that there are no tied event times and that the number of events
$p(n)$ increases with the sample size $n$. We reorder the indices of the data
such that $X_1 < \cdots < X_{p(n)}$ [$p(n) \le n$] represent the increasingly
ordered event times and $X_{p(n)+1} \le \cdots \le X_n$ represent the
nondecreasingly ordered censoring times. To define an estimator of $\Lambda$
out of the likelihood (2.2), we proceed by the method of sieves [16], which
consists in replacing the parameter space $\Theta$ by an appropriate
approximating space $\Theta_n$ called a sieve (we refer to Li and Lin [19],
McKeague [21] and Murphy and Sen [25] among others, for use of sieves in
various settings of survival analysis). Precisely, instead of the functions
$\Lambda = \Lambda(t), t \ge 0$, one considers increasing stepwise versions
$\Lambda_n = \Lambda_n(t), t \ge 0$, of them with the unknown deterministic
values $\Lambda_n(X_i) = \Lambda_{n,i}$ in the points $X_i$,
$i = 1, \ldots, p(n)$. The sieve $\Theta_n$ is then

$\Theta_n = \{\theta = (\alpha, \beta, \Lambda_n) : \alpha \in \mathbb{R}^p, \beta \in \mathbb{R}, \Lambda_{n,1} \le \cdots \le \Lambda_{n,p(n)}, \Lambda_{n,i} \in \mathbb{R}, i = 1, \ldots, p(n)\}$.



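We shall estimate the values $\Lambda_{n,i}$ [$i = 1, \ldots, p(n)$] and the
parameters $\beta$ and $\alpha$ by maximizing the likelihood (2.2) over the
parameter space $\Theta_n$, which means maximizing the pseudo-likelihood

(3.3) $L_n(\theta) = \prod_{i=1}^n L^{(i)}(\theta)$

obtained by multiplying over uncensored subjects $i$
[$i = 1, \ldots, p(n)$] the individual contribution $L^{(i)}(\theta)$,

$\int \Delta\Lambda_{n,i} \exp\left[\beta z - \sum_{k=1}^{i} \Delta\Lambda_{n,k} e^{\beta z_i(X_k)}\right] f(z_{i,0}, \ldots, z_{i,a_{X_i}}, z; \alpha)\,dz$,

with $z_i(X_i) = z$, and over censored subjects $i$
[$i = p(n) + 1, \ldots, n$] the individual contribution

$\int \exp\left[-\sum_{k=1}^{p(n)} \Delta\Lambda_{n,k} e^{\beta z_i(X_k)} 1_{\{X_k \le X_i\}}\right] f(z_{i,0}, \ldots, z_{i,a_{X_i}}, z; \alpha)\,dz$,

where $\Delta\Lambda_{n,k} = \Delta\Lambda_n(X_k) = \Lambda_{n,k} - \Lambda_{n,k-1}$
[$k = 2, \ldots, p(n)$] and $\Delta\Lambda_{n,1} = \Lambda_{n,1}$.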
The resulting estimator is usually referred to as the nonparametric maximum
likelihood estimator (NPMLE). We will use this terminology in the following,
keeping in mind that the only part which is really nonparametric is the
representation of the baseline hazard. We next demonstrate that the estimator
in the joint model does share the useful properties of the estimators from
parametric models. We refer to [19, 23, 24, 28, 34] for uses of the NPMLE in
various situations.
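
The stepwise representation of the cumulative hazard used in the sieve can
be sketched as follows: the jumps are first differences of the values at the
ordered event times, and the step function is evaluated by a sorted lookup.
The function names and the example numbers are illustrative assumptions.

```python
import numpy as np

def jumps_from_values(cum_values):
    """Jumps of a step function from its values at ordered event times:
    the first jump is the first value, later jumps are differences."""
    return np.diff(np.asarray(cum_values, dtype=float), prepend=0.0)

def step_cumulative_hazard(t, event_times, cum_values):
    """Evaluate the right-continuous step function at time t."""
    idx = np.searchsorted(event_times, t, side="right")  # event times <= t
    return 0.0 if idx == 0 else float(cum_values[idx - 1])

event_times = np.array([1.0, 2.0, 3.0])
cum_values = np.array([0.2, 0.5, 0.9])
jumps = jumps_from_values(cum_values)
```

The monotonicity constraint of the sieve corresponds to all jumps being
nonnegative.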

Proposition 2 shows that such an estimator exists in the proposed joint
model. The proof is given in the Appendix.

Proposition 2. A maximizer
$\hat\theta_n = (\hat\alpha_n, \hat\beta_n, \hat\Lambda_n)$ of $L_n(\theta)$
over $\theta \in \Theta_n$ exists and is achieved.

To maximize the logarithm of the likelihood $L_n(\theta)$, we use the
well-known approach used for the so-called expectation-maximization (EM)
algorithm [9]. The rationale for this approach is that direct maximization of
the integrated likelihood (3.3) is difficult, and that in the present
setting, the maximizer $\hat\theta_n$ can be more easily characterized from
an alternative EM-loglikelihood.

The following proposition provides an important characterization of the
maximizer of $\sum_{i=1}^n \ln f_Y(y_i; \theta)$ [or, equivalently, of
$L_n(\theta)$] on $\Theta_n$. This characterization will serve for the proof
of asymptotic properties.

In the following, for any random variable $X$ with density function
$f_X(\cdot; \theta)$, we shall denote by $E_\theta[g(X)]$ the expected value
of $g(X)$. Moreover, if $X$ and $Y$ are random variables, we shall denote by
$E_\theta[g(X) \mid y]$ the expectation of $g(X)$ taken with respect to the
conditional density function $f_{X|Y}(\cdot \mid y; \theta)$ of $X$ given
$Y = y$.

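Proposition 3. The NPML estimator $\hat\theta_n$ satisfies the equation

$\hat\Lambda_n(t) = \int_0^t \frac{dH_n(u)}{W_n(u; \hat\theta_n)}$,

where

$H_n(u) = \frac{1}{n}\sum_{i=1}^n \Delta_i 1_{\{X_i \le u\}}$ and
$W_n(u; \theta) = \frac{1}{n}\sum_{i=1}^n E_\theta[e^{\beta Z(u)} 1_{\{u \le X\}} \mid y_i]$.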
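
The proof is given in the Appendix. In this proof and in the following, we
shall use the notation

$L^{(i)}_{\hat\theta_n}(\theta) = E_{\hat\theta_n}[\ln l(Y, Z; \theta) \mid y_i]$,

and refer to
$L_{\hat\theta_n}(\theta) = \sum_{i=1}^n L^{(i)}_{\hat\theta_n}(\theta)$ as
the EM-loglikelihood.
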
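The characterization in Proposition 3 has the form of a Breslow-type update:
assuming it reads $\hat\Lambda_n(t) = \int_0^t dH_n(u) / W_n(u; \hat\theta_n)$,
with $H_n$ the normalized counting process of observed failures and $W_n$ an
average of conditional expectations, the cumulative hazard can be recomputed
from those expectations. The sketch below takes the conditional expectations
as a given input matrix `cond_exp` (a hypothetical name); how they are
computed depends on the chosen covariate model and is not shown.

```python
import numpy as np

def cumulative_hazard_update(cond_exp):
    """Breslow-type update of the cumulative hazard at the ordered event times.

    cond_exp[i, k] is assumed to hold the conditional expectation
    E[exp(beta * Z_i(X_k)) * 1{X_k <= X_i} | y_i] for subject i and the
    k-th ordered event time (hypothetical input, computed elsewhere).
    """
    cond_exp = np.asarray(cond_exp, dtype=float)
    n, p = cond_exp.shape
    w_n = cond_exp.mean(axis=0)      # W_n evaluated at X_1 < ... < X_p
    dh_n = np.full(p, 1.0 / n)       # jump of H_n at each event time (no ties)
    return np.cumsum(dh_n / w_n)

# Two subjects, both failing; with beta = 0 the expectations are the
# at-risk indicators and the update is the Nelson-Aalen estimator.
cumhaz = cumulative_hazard_update([[1.0, 0.0], [1.0, 1.0]])
```

In an EM-type iteration this update would alternate with a recomputation of
the conditional expectations at the current parameter value.
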
4. Large sample properties. 

4.1. Consistency. Since we are interested in almost sure (a.s.) consistency,
we work with fixed realizations of the data which are assumed to lie in a set
of probability one. Let $\|\cdot\|_\infty$ denote the supremum norm on
$[0, \tau]$ and $\|\cdot\|$ denote the Euclidean norm.

Theorem 1. Under conditions C1-C7, the NPML estimator
$\hat\theta_n = (\hat\alpha_n, \hat\beta_n, \hat\Lambda_n)$ is consistent:
$\|\hat\alpha_n - \alpha_0\|$, $|\hat\beta_n - \beta_0|$ and
$\|\hat\Lambda_n - \Lambda_0\|_\infty$ converge a.s. to zero as
$n \to \infty$.



Proof. In the following, it will be convenient to denote $(\alpha, \beta)$
by $\gamma$. Our proof follows Murphy's [23] proof of a.s. consistency in the
frailty model. The plan for proving consistency is as follows. We first show
that the set
$\{\hat\theta_n = (\hat\gamma_n, \hat\Lambda_n), n \in \mathbb{N}\}$ is
relatively compact. Using the proposition on identifiability, we then show
that its closure reduces to the single element
$\theta_0 = (\gamma_0, \Lambda_0)$.

We first show that $(\hat\Lambda_n)_{n \in \mathbb{N}}$ stays bounded as
$n \to \infty$. We note from (A.2) in the Appendix that
$\hat\Lambda_n(t) \ge 0$ for all $t \in [0, \tau]$ and that

$\hat\Lambda_n(\tau) \le \frac{\sum_{i=1}^n \Delta_i 1_{\{X_i \le \tau\}}}{m \sum_{i=1}^n 1_{\{\tau \le X_i\}}}$,

where $m = \min_{\beta \in B, |z| \le c} e^{\beta z}$. Noting that there
exists a constant $l$ such that
$1 / [n^{-1} \sum_{i=1}^n 1_{\{\tau \le X_i\}}] \le 1 / P(X \ge \tau) + l$ as
$n \to \infty$, it follows that $\hat\Lambda_n(\tau)$ does not diverge to
infinity.

Let $\phi(n)$ be an arbitrary subsequence of $(n)$. From the
Bolzano-Weierstrass theorem, $(\hat\gamma_{\phi(n)})_{n \in \mathbb{N}}$,
being a bounded sequence of $\mathbb{R}^{p+1}$, has a convergent subsequence
$(\hat\gamma_{\psi(\phi(n))})_{n \in \mathbb{N}}$ which converges to some
$\gamma^*$. Since $\hat\Lambda_n$ is not allowed to diverge, the Helly-Bray
lemma can be used to prove the existence of a subsequence
$(\hat\Lambda_{\eta(\psi(\phi(n)))})_{n \in \mathbb{N}}$ of
$(\hat\Lambda_{\psi(\phi(n))})_{n \in \mathbb{N}}$ which converges pointwise
to some $\Lambda^*$. Since every subsequence of a convergent sequence in
$\mathbb{R}^{p+1}$ must converge to the same limit,
$(\hat\gamma_{\eta(\psi(\phi(n)))})_{n \in \mathbb{N}}$ must converge to
$\gamma^*$.

Hence, for any given subsequence we can find a further subsequence
$\hat\theta_{\eta(\psi(\phi(n)))}$ which converges to some
$\theta^* = (\gamma^*, \Lambda^*)$. We now show that
$\hat\Lambda_{\eta(\psi(\phi(n)))}$ converges uniformly to $\Lambda^*$. In
what follows, we use the following notation for the sake of clarity of
formulas: $g(n) = \eta(\psi(\phi(n)))$.

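We shall use in the sequel the Helly-Bray lemma and the result that the
class of all functions $f : [0, \tau] \to \mathbb{R}$ that are uniformly
bounded and of uniformly bounded variation is Glivenko-Cantelli [33].

We first define the intermediate quantity $\bar\Lambda_n$ by
$\bar\Lambda_n(t) = \int_0^t \frac{dH_n(u)}{W_n(u; \theta_0)}$, which will
help mediate between $\hat\Lambda_n$ and $\Lambda_0$. By Glivenko-Cantelli,
$H_n(u) = n^{-1} \sum_{i=1}^n \Delta_i 1_{\{X_i \le u\}}$ converges uniformly
on $[0, \tau]$ to $H(u) = E_{\theta_0}[\Delta 1_{\{X \le u\}}]$. Note that
$\Lambda_0(t) = \int_0^t \frac{dH(u)}{W(u, \theta_0)}$, where
$W(u, \theta_0) = E_{\theta_0}[e^{\beta_0 Z(u)} 1_{\{u \le X\}}]$.

The functions
$u \mapsto E_{\theta_0}[e^{\beta_0 Z(u)} 1_{\{u \le X\}} \mid y]$ are
uniformly bounded and of uniformly bounded variation. Hence,
$(W_n(u, \theta_0))_{n \in \mathbb{N}}$ converges uniformly to
$W(u, \theta_0) = E_{\theta_0}[e^{\beta_0 Z(u)} 1_{\{u \le X\}}]$, which is
bounded away from 0 by condition C6. Hence,
$(1 / W_n(u, \theta_0))_{n \in \mathbb{N}}$ converges to
$1 / W(u, \theta_0)$ uniformly on $[0, \tau]$.

Applying the Helly-Bray lemma gives that both
$\|\bar\Lambda_n - \Lambda_0\|_\infty$ and
$\|\hat\Lambda_{g(n)} - \Lambda^*\|_\infty$ converge almost surely to 0. This
establishes the relative compactness of
$\{\hat\theta_n, n \in \mathbb{N}\}$.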

We now show that every subsequence
$(\hat\theta_{g(n)})_{n \in \mathbb{N}}$ must converge to the true value
$\theta_0 = (\gamma_0, \Lambda_0)$. Since $\hat\theta_{g(n)}$ maximizes the
loglikelihood,

$\frac{1}{g(n)} \sum_{i=1}^{g(n)} [\ln L^{(i)}(\hat\theta_{g(n)}) - \ln L^{(i)}(\gamma_0, \bar\Lambda_{g(n)})] \ge 0$.

Note that, for all $g(n)$, as $m \to \infty$,

$\frac{1}{m} \sum_{i=1}^{m} [\ln L^{(i)}(\hat\theta_{g(n)}) - \ln L^{(i)}(\gamma_0, \bar\Lambda_{g(n)})] \to E_{\theta_0}[\ln L(\hat\theta_{g(n)}) - \ln L(\gamma_0, \bar\Lambda_{g(n)})]$ a.s.

It follows that

(4.4) $E_{\theta_0}[\ln L(\hat\theta_{g(n)}) - \ln L(\gamma_0, \bar\Lambda_{g(n)})] \ge -o(1)$.

We have

$\ln L(\hat\theta_{g(n)}) - \ln L(\gamma_0, \bar\Lambda_{g(n)}) \to \ln L(\theta^*) - \ln L(\theta_0)$ a.s.

By Lebesgue's theorem,

$E_{\theta_0}[\ln(L(\hat\theta_{g(n)}) / L(\gamma_0, \bar\Lambda_{g(n)}))] \to E_{\theta_0}[\ln(L(\theta^*) / L(\theta_0))]$ a.s.

From (4.4), $E_{\theta_0}[\ln(L(\theta^*) / L(\theta_0))] \ge 0$. This
quantity cannot be treated directly as a Kullback-Leibler distance, since
(2.2) is not a likelihood in the traditional sense. Due to the missing
observation, it may not even be viewed as a generalized likelihood in the
sense of Jacobsen [18]. However, it can be shown that
$E_{\theta_0}[\ln(L(\theta^*) / L(\theta_0))] \le 0$, and, moreover, that it
is equal to zero if and only if $L(\theta^*) = L(\theta_0)$ a.e. This in turn
implies $\theta^* = \theta_0$ by Proposition 1. □

4.2. Asymptotic normality. In order to establish the asymptotic distribution
of the proposed estimators $\hat\theta_n$, we follow the function analytic
approach described by Murphy [24] to derive asymptotic theory for the frailty
model. To calculate the score equations, instead of differentiating
$L_{\hat\theta_n}(\theta)$ with respect to $\alpha$, $\beta$ and the jump
sizes of the cumulative baseline hazard function, we consider one-dimensional
submodels through the estimators and we differentiate at the estimators. That
is, we set $\theta_t = (\alpha_t, \beta_t, \Lambda_t)$,
$\alpha_t = \alpha + t h_1$, $\beta_t = \beta + t h_2$ and
$\Lambda_t(\cdot) = \int_0^{\cdot} (1 + t h_3(u))\,d\Lambda(u)$, where $h_1$
is a $p$-dimensional vector, $h_2 \in \mathbb{R}$ and $h_3$ is a function of
bounded variation defined on $[0, \tau]$.

More precisely, let the class of $(h_1, h_2, h_3)$ be the space
$H = \{h = (h_1, h_2, h_3) \mid h_1$ is a $p$-vector, $h_2 \in \mathbb{R}$,
$h_3$ is a bounded function of bounded variation on $[0, \tau]\}$.

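The following proposition gives the form of the empirical score
$S_{n,\hat\theta_n}$. Its proof is given in the Appendix. Define first

$S_{n,\tilde\theta,1}(\theta) = \frac{1}{n} \sum_{i=1}^n E_{\tilde\theta}\left[\frac{\partial}{\partial\alpha} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha) \mid y_i\right]$,

$S_{n,\tilde\theta,2}(\theta) = \frac{1}{n} \sum_{i=1}^n E_{\tilde\theta}\left[\Delta Z - \int_0^X Z(u) e^{\beta Z(u)}\,d\Lambda(u) \mid y_i\right]$,

$S_{n,\tilde\theta,3}(\theta)(h_3) = \frac{1}{n} \sum_{i=1}^n \left(\delta_i h_3(x_i) - E_{\tilde\theta}\left[\int_0^X h_3(u) e^{\beta Z(u)}\,d\Lambda(u) \mid y_i\right]\right)$,

for some value $\tilde\theta$ of $\theta$, and the notation
$S_{n,\tilde\theta,12}(\theta) = (S_{n,\tilde\theta,1}(\theta)^T, S_{n,\tilde\theta,2}(\theta))^T$
and $h_{12} = (h_1^T, h_2)^T$.

Proposition 4. The empirical score can be written as

$S_{n,\hat\theta_n}(\theta)(h) = h_{12}^T S_{n,\hat\theta_n,12}(\theta) + S_{n,\hat\theta_n,3}(\theta)(h_3)$.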

In the sequel, letting
$s_{\tilde\theta}(y, \theta)(h) = \frac{\partial}{\partial t} L_{\tilde\theta}(\theta_t)|_{t=0}$,
we shall write the empirical and expected scores as

$S_{n,\tilde\theta}(\theta)(h) = \frac{1}{n} \sum_{i=1}^n s_{\tilde\theta}(y_i, \theta)(h)$, $\quad S_{\tilde\theta}(\theta)(h) = E_{\theta_0}[s_{\tilde\theta}(Y, \theta)(h)]$.

We define the following norm on $H$: if $h \in H$, let
$\|h\|_H = \|h_1\| + |h_2| + \|h_3\|_v$, where $\|\cdot\|$ is the Euclidean
norm and $\|h_3\|_v$ is the absolute value of $h_3(0)$ plus the total
variation of $h_3$ on the interval $[0, \tau]$. We further define
$H_p = \{h \in H, \|h\|_H \le p\}$,
$H_\infty = \{h \in H, \|h\|_H < \infty\}$ and $BV_p$ to be the space of
real-valued functions on $[0, \tau]$ bounded by $p$ and of variation bounded
by $p$.
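
The norm on $H$ is easy to compute for a direction whose third component is
known on a grid. The sketch below assumes that `h3_values` lists the values
of $h_3$ on an increasing grid starting at 0 and that $h_3$ is monotone
between grid points (for instance a step function), so that the total
variation equals the sum of absolute increments. The function name is
hypothetical.

```python
import numpy as np

def h_norm(h1, h2, h3_values):
    """Norm ||h||_H = ||h1|| + |h2| + |h3(0)| + total variation of h3.

    h3_values: values of h3 on an increasing grid whose first point is 0
    (assumption); the variation is computed from successive increments.
    """
    h3_values = np.asarray(h3_values, dtype=float)
    total_variation = np.sum(np.abs(np.diff(h3_values)))
    return float(np.linalg.norm(h1) + abs(h2)
                 + abs(h3_values[0]) + total_variation)

value = h_norm([3.0, 4.0], -2.0, [1.0, 3.0, 2.0])
```

A direction belongs to the ball of radius p exactly when this value does not
exceed p.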

Define
$\theta(h) = (\alpha, \beta, \Lambda)(h) = h_1^T \alpha + h_2 \beta + \int_0^\tau h_3(u)\,d\Lambda(u)$.
Then we can consider the parameter $\theta$ as a functional on $H_p$, and the
parameter space $\Theta$ as a subset of $l^\infty(H_p)$, the space of bounded
real-valued functions on $H_p$. We define on $l^\infty(H_p)$ the norm
$\|U\|_p = \sup_{h \in H_p} |U(h)|$. For any finite $p$, the score function
$S_{n,\hat\theta_n}$ is a map from $\Theta$ to $l^\infty(H_p)$.
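
For a stepwise cumulative hazard, the integral in the functional is a finite
sum over the event times, so the parameter viewed as a functional of a
direction can be evaluated directly. The sketch below is an illustration
under that assumption; the function and argument names are hypothetical.

```python
import numpy as np

def theta_functional(alpha, beta, jumps, h1, h2, h3_at_events):
    """Evaluate h1' alpha + h2 * beta + integral of h3 dLambda for a step
    function Lambda with the given jumps at the ordered event times.

    h3_at_events: values of h3 at those same event times.
    """
    return float(np.dot(h1, alpha) + h2 * beta + np.dot(h3_at_events, jumps))

value = theta_functional(
    alpha=[1.0, 2.0], beta=0.5, jumps=[0.1, 0.2],
    h1=[1.0, 1.0], h2=2.0, h3_at_events=[1.0, 0.0],
)
```

Choosing a direction with a single nonzero component recovers the
corresponding coordinate of the parameter, which is how specific estimators
are singled out later in the section.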

We obtain the following result: 

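Theorem 2. Let $0 < p < \infty$. Under assumptions C1-C7, the sequence
$\sqrt{n}(\hat\alpha_n - \alpha_0, \hat\beta_n - \beta_0, \hat\Lambda_n - \Lambda_0)$
weakly converges in $l^\infty(H_p)$ to a centered Gaussian process $G$ with
covariance process

$\mathrm{cov}[G(g), G(g^*)] = \int_0^\tau g_3(u) \sigma^{-1}_{3,\theta_0}(g^*)(u)\,d\Lambda_0(u) + \sigma^{-1}_{2,\theta_0}(g^*) g_2 + \sigma^{-1}_{1,\theta_0}(g^*)^T g_1$,

where
$\sigma^{-1}_{\theta_0} = (\sigma^{-1}_{1,\theta_0}, \sigma^{-1}_{2,\theta_0}, \sigma^{-1}_{3,\theta_0})$
is the inverse of the continuously invertible linear operator
$\sigma_{\theta_0} = (\sigma_{1,\theta_0}, \sigma_{2,\theta_0}, \sigma_{3,\theta_0})$
from $H_\infty$ to $H_\infty$, defined by

$\sigma_{1,\theta_0}(h) = -E_{\theta_0}\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha_0)\right] h_1$,

$\sigma_{2,\theta_0}(h) = E_{\theta_0}\left[\int_0^X Z(u) e^{\beta_0 Z(u)} (Z(u) h_2 + h_3(u))\,d\Lambda_0(u)\right]$,

$\sigma_{3,\theta_0}(h)(u) = E_{\theta_0}[(Z(u) h_2 + h_3(u)) e^{\beta_0 Z(u)} 1_{\{u \le X\}}]$.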

Proof. The proof is based on a theorem by van der Vaart and Wellner [33],
which is stated as Lemma A.1 in the Appendix. In the following lemmas, we
verify that the conditions stated in this theorem are satisfied by our
estimator. Some additional technical lemmas are given in the Appendix, in
order to keep attention on the main steps of the demonstration.

We first establish Fréchet differentiability of the map
$\theta \mapsto S_{\theta_0}(\theta)$ at $\theta_0$. Let us define the
operator $\sigma_\theta$ from $H_\infty$ to $H_\infty$ by
$\sigma_\theta(h) = (\sigma_{1,\theta}(h), \sigma_{2,\theta}(h), \sigma_{3,\theta}(h))$,
where

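$\sigma_{1,\theta}(h) = -E_{\theta_0}\left[E_\theta\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha) \mid y\right]\right] h_1$,

(4.5) $\sigma_{2,\theta}(h) = E_{\theta_0}\left[E_\theta\left[\int_0^X Z(u) e^{\beta Z(u)} (Z(u) h_2 + h_3(u))\,d\Lambda(u) \mid y\right]\right]$,

$\sigma_{3,\theta}(h)(u) = E_{\theta_0}[E_\theta[(Z(u) h_2 + h_3(u)) e^{\beta Z(u)} 1_{\{u \le X\}} \mid y]]$.

Lemma 1. For any finite $p$, the following holds: there exists a continuous
linear operator
$\dot S_{\theta_0}(\theta_0) : \mathrm{lin}\,\Theta \to l^\infty(H_p)$ such
that
$\|S_{\theta_0}(\theta) - S_{\theta_0}(\theta_0) - \dot S_{\theta_0}(\theta_0)(\theta - \theta_0)\|_p = o_P(\|\theta - \theta_0\|_p)$
as $\|\theta - \theta_0\|_p \to 0$. The form of
$\dot S_{\theta_0}(\theta_0)$ is as follows:

$\dot S_{\theta_0}(\theta_0)(\theta)(h) = -\int_0^\tau \sigma_{3,\theta_0}(h)(u)\,d\Lambda(u) - \beta \sigma_{2,\theta_0}(h) - \alpha^T \sigma_{1,\theta_0}(h)$.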

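Proof. To establish this, we use the following characterization of Fréchet
differentiability (see [2], page 454). Let $T$ be a function from a normed
linear space $X$ to another normed linear space $Y$. Let $\mathcal{S}$ be the
set of all bounded subsets of $X$. $T$ is Fréchet differentiable at $x$ with
derivative $\dot T_x$ if, for all $S \in \mathcal{S}$,

$\frac{T(x + \varepsilon s) - T(x) - \dot T_x(\varepsilon s)}{\varepsilon} \to 0$ as $\varepsilon \to 0$ uniformly in $s \in S$.

We first calculate the derivative $D_\theta S_{\theta_0}(\theta_0)$ given by

$D_\theta S_{\theta_0}(\theta_0) = \frac{d}{dt} S_{\theta_0}(\theta_0 + t\theta)|_{t=0}$,

where
$\theta_0 + t\theta = (\alpha_0 + t\alpha, \beta_0 + t\beta, \Lambda_0(\cdot) + t\Lambda(\cdot))$:

$S_{\theta_0}(\theta_0 + t\theta)(h) = E_{\theta_0}\Big[\Delta h_3(X) - \int_0^X h_3(u) e^{[\beta_0 + t\beta] Z(u)} (d\Lambda_0(u) + t\,d\Lambda(u))$
$\quad + h_2 \Delta Z - \int_0^X h_2 Z(u) e^{[\beta_0 + t\beta] Z(u)} (d\Lambda_0(u) + t\,d\Lambda(u))$
$\quad + h_1^T \frac{\partial}{\partial\alpha} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha_0 + t\alpha)\Big]$.

The expression for $D_\theta S_{\theta_0}(\theta_0)(h)$ immediately follows
and, using a first-order Taylor expansion of
$\exp([\beta_0 + \varepsilon\beta] Z(u))$ around $\exp(\beta_0 Z(u))$, it is
fairly straightforward to see that

$S_{\theta_0}(\theta_0 + \varepsilon\theta)(h) - S_{\theta_0}(\theta_0)(h) - D_{\varepsilon\theta} S_{\theta_0}(\theta_0)(h) = o(\varepsilon)$.

Now, as $\varepsilon \to 0$,
$\|S_{\theta_0}(\theta_0 + \varepsilon\theta) - S_{\theta_0}(\theta_0) - D_{\varepsilon\theta} S_{\theta_0}(\theta_0)\|_p / \varepsilon$
converges to 0 uniformly in $\theta$ ranging over any element of the class of
bounded subsets of $\mathrm{lin}\,\Theta$, where the notation "lin" before a
set denotes the set of all finite linear combinations of elements of this
set.

It follows that $S_{\theta_0}$ is Fréchet differentiable at $\theta_0$ and
that the Fréchet derivative $\dot S_{\theta_0}(\theta_0)(\theta)$ is given by
$\dot S_{\theta_0}(\theta_0)(\theta) = D_\theta S_{\theta_0}(\theta_0)$. □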

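We now consider the asymptotic distribution of the score function.

Lemma 2. For any finite $p$, the following holds: let $G$ be a tight
Gaussian process on $l^\infty(H_p)$ with covariance

$\mathrm{cov}(G(h), G(h^*)) = \int_0^\tau h_3(u) \sigma_{3,\theta_0}(h^*)(u)\,d\Lambda_0(u) + h_2 \sigma_{2,\theta_0}(h^*) + h_1^T \sigma_{1,\theta_0}(h^*)$.

Then $\sqrt{n}(S_{n,\theta_0}(\theta_0) - S_{\theta_0}(\theta_0))$ converges
weakly to $G$ in $l^\infty(H_p)$.

Proof. Note that
$\sqrt{n}(S_{n,\theta_0}(\theta_0) - S_{\theta_0}(\theta_0))(h)$ can be
written as the centered sum, normalized by $\sqrt{n}$, over subjects $i$ of

$h_{12}^T S_{\theta_0,12}(\theta_0) + \delta_i h_3(x_i) - \int_0^{x_i} h_3(u) E_{\theta_0}[e^{\beta_0 Z(u)} \mid y_i]\,d\Lambda_0(u)$.

Note that
$\{h_{12}^T S_{\theta_0,12}(\theta_0) : h_1 \in \mathbb{R}^p, \|h_1\| \le p, h_2 \in \mathbb{R}, |h_2| \le p\}$
is bounded Donsker. The class $\{\delta h_3(x), h_3 \in BV_p\}$ is Donsker
(this follows from the fact that the class of real-valued functions on
$[0, \tau]$ that are uniformly bounded and of uniformly bounded variation is
Donsker). The class
$\{\int_0^x h_3(u) E_{\theta_0}[e^{\beta_0 Z(u)} \mid y]\,d\Lambda_0(u) : h_3 \in BV_p\}$
is a bounded Donsker class. Then the class
$\{h_{12}^T S_{\theta_0,12}(\theta_0) + \delta h_3(x) - \int_0^x h_3(u) E_{\theta_0}[e^{\beta_0 Z(u)} \mid y]\,d\Lambda_0(u) : h \in H_p\}$
is Donsker since the sum of bounded Donsker classes is Donsker. It follows
that $\sqrt{n} S_{n,\theta_0}(\theta_0)$ converges in distribution to a zero
mean tight Gaussian process $G$ in $l^\infty(H_p)$.

The asymptotic distribution of the score
$\sqrt{n}(S_{n,\theta_0}(\theta_0) - S_{\theta_0}(\theta_0))$ is that of a
tight Gaussian process $G$ in $l^\infty(H_p)$ whose variance
$\mathrm{var}(G(h))$ is calculated as

$-\frac{\partial}{\partial s} E_{\theta_0}[s_{\theta_0}(Y, \theta_{0,s})(h)]|_{s=0}$, which is $-\dot S_{\theta_0}(\theta_0)(h)(h)$.

The covariance of $G$ is calculated as

$\mathrm{cov}(G(h), G(h^*)) = -E_{\theta_0}\left[\frac{\partial}{\partial s} s_{\theta_0}(Y, \theta_{0,s})(h)|_{s=0}\right] = -\dot S_{\theta_0}(\theta_0)(h^*)(h)$.

Let $\theta_s = (\alpha_s, \beta_s, \Lambda_s)$ with
$\alpha_s = \alpha + s h_1^*$, $\beta_s = \beta + s h_2^*$ and
$\Lambda_s(\cdot) = \int_0^{\cdot} (1 + s h_3^*(u))\,d\Lambda(u)$. Then we
can calculate $\frac{\partial}{\partial s} s_\theta(y, \theta_s)(h)$ as

$-E_\theta\left[\int_0^X (h_3(u) + h_2 Z(u)) e^{[\beta + s h_2^*] Z(u)} [h_2^* Z(u)(1 + s h_3^*(u)) + h_3^*(u)]\,d\Lambda(u) \mid y\right]$
$\quad + h_1^T E_\theta\left[\frac{\partial^2}{\partial s\,\partial\alpha} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha + s h_1^*) \mid y\right]$.

Calculation of
$\frac{\partial}{\partial s} s_\theta(y, \theta_s)(h)|_{s=0}$ is
straightforward, and using notation defined above, it follows that

$\mathrm{cov}(G(h), G(h^*)) = \int_0^\tau h_3(u) \sigma_{3,\theta_0}(h^*)(u)\,d\Lambda_0(u) + h_2 \sigma_{2,\theta_0}(h^*) + h_1^T \sigma_{1,\theta_0}(h^*)$. □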



The approximation condition (A.3) in Lemma A.1 follows by the Donsker
property of the class of functions
$\{s_\theta(y, \theta)(h) - s_{\theta_0}(y, \theta_0)(h) : \|\theta - \theta_0\|_p < \varepsilon, h \in H_p\}$
for some $\varepsilon > 0$. Details can be found in [11].

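We now consider continuous invertibility of $\dot S_{\theta_0}(\theta_0)$.

Lemma 3. For any finite $p$, $\dot S_{\theta_0}(\theta_0)$ is continuously
invertible on its range.

Proof. Continuous invertibility of $\dot S_{\theta_0}(\theta_0)$ on its
range for some $p$ is equivalent (see [2], page 418) to the fact that there
exists some $l > 0$ such that

(4.6) $\inf_{\theta \in \mathrm{lin}\,\Theta} \frac{\|\dot S_{\theta_0}(\theta_0)(\theta)\|_p}{\|\theta\|_p} > l$.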

To prove (4.6), we follow two steps. We first show that $\sigma_{\theta_0}$
is a continuously invertible operator from $H_\infty$ to $H_\infty$. We can
achieve this by proving that $\sigma_{\theta_0}$ is one-to-one and that it
can be written as the sum of a continuously invertible operator $\Sigma$ plus
a compact operator ([32], page 424).

From Lemma A.3 in the Appendix, we know that $\sigma_{\theta_0}$ is
one-to-one, hence, we want to show that $\sigma_{\theta_0}$ can be written as
the sum of a continuously invertible operator $\Sigma$ and a compact linear
operator. We define $\Sigma$ as

$\Sigma(h) = \Big(-E_{\theta_0}\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T} \ln f(Z_0, \ldots, Z_{a_X}, Z; \alpha_0)\right] h_1,$
$\quad E_{\theta_0}\left[\int_0^X \{Z(u)\}^2 e^{\beta_0 Z(u)}\,d\Lambda_0(u)\right] h_2, \; E_{\theta_0}[e^{\beta_0 Z(u)} 1_{\{u \le X\}}] h_3(u)\Big)$.

From conditions C1-C7, it follows that $\Sigma^{-1}$ is a bounded linear
operator and, hence, that $\Sigma$ is continuously invertible.

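We now have to show that $\sigma_{\theta_0}(h) - \Sigma(h)$ is compact. Let
$(h_n)_{n \in \mathbb{N}} = (h_{1n}, h_{2n}, h_{3n})_{n \in \mathbb{N}}$ be a
sequence in $H_p$. By the definition of a compact operator [27], we must
prove that there exists a convergent subsequence of
$(\sigma_{\theta_0}(h_n) - \Sigma(h_n))_{n \in \mathbb{N}}$.

Since $h_{3n}$ is of bounded variation, we can write $h_{3n}$ as the
difference of bounded increasing functions $h^{(1)}_{3n}$ and
$h^{(2)}_{3n}$. From Helly's theorem, there exists a subsequence
$(h^{(1)}_{3\phi(n)})$ of $(h^{(1)}_{3n})$ which converges pointwise to some
$h^{(1)*}_3$. There also exists a subsequence $(h^{(2)}_{3\eta(\phi(n))})$ of
$(h^{(2)}_{3\phi(n)})$ which converges pointwise to some $h^{(2)*}_3$.
Finally, $(h^{(1)}_{3\eta(\phi(n))} - h^{(2)}_{3\eta(\phi(n))})$ converges
pointwise to $h^*_3 = h^{(1)*}_3 - h^{(2)*}_3$. Using the same argument and
the Bolzano-Weierstrass theorem, we can find a subsequence of
$(h_n)_{n \in \mathbb{N}}$ [let us denote it by
$(h_{g(n)})_{n \in \mathbb{N}}$ for notational simplicity] that converges to
$h^* = (h^*_1, h^*_2, h^*_3)$.

We must prove that $\sigma_{\theta_0}(h_{g(n)}) - \Sigma(h_{g(n)})$ converges
to $\sigma_{\theta_0}(h^*) - \Sigma(h^*)$ in $H_p$ for all $p$. Note that
$\sigma_{\theta_0}(h) - \Sigma(h)$ is equal to

$\left(0, \; E_{\theta_0}\left[\int_0^X Z(u) e^{\beta_0 Z(u)} h_3(u)\,d\Lambda_0(u)\right], \; E_{\theta_0}[h_2 Z(u) e^{\beta_0 Z(u)} 1_{\{u \le X\}}]\right)$,

so that
$\sigma_{\theta_0}(h_{g(n)}) - \Sigma(h_{g(n)}) - \sigma_{\theta_0}(h^*) + \Sigma(h^*)$
is equal to

$\left(0, \; E_{\theta_0}\left[\int_0^X Z(u) e^{\beta_0 Z(u)} (h_{3g(n)} - h^*_3)(u)\,d\Lambda_0(u)\right], \; (h_{2g(n)} - h^*_2) E_{\theta_0}[Z(u) e^{\beta_0 Z(u)} 1_{\{u \le X\}}]\right)$.

Now
$\|\sigma_{\theta_0}(h_{g(n)}) - \Sigma(h_{g(n)}) - \sigma_{\theta_0}(h^*) + \Sigma(h^*)\|_H$
is, under conditions C1-C7, bounded above by

$c e^{bc} \int_0^\tau |(h_{3g(n)} - h^*_3)(u)|\,d\Lambda_0(u) + c e^{bc} (2 + c) \cdot |h_{2g(n)} - h^*_2|$,

where $b$ is such that $|\beta_0| \le b$. From the dominated convergence
theorem, the first term converges to zero and the overall bound converges to
zero. It follows that $\sigma_{\theta_0}(h) - \Sigma(h)$ is a compact
operator for all $p$.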

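We have then proved that $\sigma_{\theta_0}$ is a continuously invertible
operator. This means that, for all $p > 0$, there exists a $q > 0$ such that
$\sigma^{-1}_{\theta_0}(H_q) \subset H_p$. Hence, the LHS of (4.6) is bounded
below by

$\inf_{\theta \in \mathrm{lin}\,\Theta} \frac{\sup_{h \in \sigma^{-1}_{\theta_0}(H_q)} |\dot S_{\theta_0}(\theta_0)(\theta)(h)|}{\|\theta\|_p}$

$= \inf_{\theta \in \mathrm{lin}\,\Theta} \frac{\sup_{h \in H_q} \left|\int_0^\tau \sigma_{3,\theta_0}(\sigma^{-1}_{\theta_0}(h))(u)\,d\Lambda(u) + \beta \sigma_{2,\theta_0}(\sigma^{-1}_{\theta_0}(h)) + \alpha^T \sigma_{1,\theta_0}(\sigma^{-1}_{\theta_0}(h))\right|}{\|\theta\|_p}$.

Recall that $\sigma_{\theta_0}$ is invertible, hence,
$\sigma_{\theta_0}(\sigma^{-1}_{\theta_0}(h)) = h$ for all
$h = (h_1, h_2, h_3) \in H_q$. Since
$\sigma_{\theta_0}(\sigma^{-1}_{\theta_0}(h)) = (\sigma_{1,\theta_0}(\sigma^{-1}_{\theta_0}(h)), \sigma_{2,\theta_0}(\sigma^{-1}_{\theta_0}(h)), \sigma_{3,\theta_0}(\sigma^{-1}_{\theta_0}(h)))$,
it follows that $\sigma_{i,\theta_0}(\sigma^{-1}_{\theta_0}(h)) = h_i$,
$i = 1, 2, 3$. Hence, the above bound can be rewritten as

(4.7) $\inf_{\theta \in \mathrm{lin}\,\Theta} \frac{\sup_{h \in H_q} \left|\int_0^\tau h_3(u)\,d\Lambda(u) + h_2 \beta + h_1^T \alpha\right|}{\|\theta\|_p} = \inf_{\theta \in \mathrm{lin}\,\Theta} \frac{\|\theta\|_q}{\|\theta\|_p}$.

Now from Lemma A.2 in the Appendix, $\|\theta\|_q$ is greater than or equal
to $q(\|\alpha\| \vee |\beta| \vee V_{[0,\tau]}(\Lambda))$ and
$\|\theta\|_p$ is less than or equal to
$3p(\|\alpha\| \vee |\beta| \vee V_{[0,\tau]}(\Lambda))$, where
$V_{[0,\tau]}(f)$ denotes the total variation of a function $f$ on
$[0, \tau]$. Hence, the RHS of (4.7) is greater than or equal to $q/(3p)$.
It follows that $\dot S_{\theta_0}(\theta_0)$ is continuously invertible. □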

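Putting all these results together, it follows that, for all $h \in H_p$,

$-\dot S_{\theta_0}(\theta_0) \sqrt{n}(\hat\theta_n - \theta_0)(h) = \sqrt{n}(S_{n,\theta_0}(\theta_0) - S_{\theta_0}(\theta_0))(h) + o_P(1)$,

where

$-\dot S_{\theta_0}(\theta_0) \sqrt{n}(\hat\theta_n - \theta_0)(h) = \int_0^\tau \sigma_{3,\theta_0}(h)(u) \sqrt{n}\,d(\hat\Lambda_n - \Lambda_0)(u)$
$\quad + \sqrt{n}(\hat\beta_n - \beta_0) \sigma_{2,\theta_0}(h) + \sqrt{n}(\hat\alpha_n - \alpha_0)^T \sigma_{1,\theta_0}(h)$.

Consequently,
$\sqrt{n}(\hat\theta_n - \theta_0)(h) \Rightarrow -\dot S_{\theta_0}(\theta_0)^{-1} G(h)$.
We now want to identify $\dot S_{\theta_0}(\theta_0)^{-1} G$.
$\sigma_{\theta_0}$ is continuously invertible, hence, for all $q > 0$ there
exists a $p > 0$ such that $\sigma^{-1}_{\theta_0}(g) \in H_q$ if
$g \in H_p$. Let $h = \sigma^{-1}_{\theta_0}(g)$. Then

(4.8) $-\dot S_{\theta_0}(\theta_0) \sqrt{n}(\hat\theta_n - \theta_0)(h) = \int_0^\tau g_3(u) \sqrt{n}\,d(\hat\Lambda_n - \Lambda_0)(u) + \sqrt{n}(\hat\beta_n - \beta_0) g_2 + \sqrt{n}(\hat\alpha_n - \alpha_0)^T g_1$

and

(4.9) $-\dot S_{\theta_0}(\theta_0) \sqrt{n}(\hat\theta_n - \theta_0)(h) = \sqrt{n}(S_{n,\theta_0}(\theta_0) - S_{\theta_0}(\theta_0))(\sigma^{-1}_{\theta_0}(g)) + o_P(1)$.

Note that the RHS of (4.8) is
$(\sqrt{n}(\hat\alpha_n - \alpha_0), \sqrt{n}(\hat\beta_n - \beta_0), \sqrt{n}(\hat\Lambda_n - \Lambda_0))(g)$,
which converges to $-\dot S_{\theta_0}(\theta_0)^{-1} G(g)$. Note also that
the RHS of (4.9) converges to $G(\sigma^{-1}_{\theta_0}(g))$ [which has mean
zero and variance
$\int_0^\tau g_3(u) \sigma^{-1}_{3,\theta_0}(g)(u)\,d\Lambda_0(u) + \sigma^{-1}_{2,\theta_0}(g) g_2 + \sigma^{-1}_{1,\theta_0}(g)^T g_1$].

It then follows that
$-\dot S_{\theta_0}(\theta_0)^{-1} G = G(\sigma^{-1}_{\theta_0})$. Hence,
$(\sqrt{n}(\hat\alpha_n - \alpha_0), \sqrt{n}(\hat\beta_n - \beta_0), \sqrt{n}(\hat\Lambda_n - \Lambda_0))$
converges in $l^\infty(H_p)$ to a tight Gaussian process $G$ in
$l^\infty(H_p)$ with mean zero and covariance process

$\mathrm{cov}[G(g), G(g^*)] = \int_0^\tau g_3(u) \sigma^{-1}_{3,\theta_0}(g^*)(u)\,d\Lambda_0(u) + \sigma^{-1}_{2,\theta_0}(g^*) g_2 + \sigma^{-1}_{1,\theta_0}(g^*)^T g_1$,

where
$(\sigma^{-1}_{1,\theta_0}, \sigma^{-1}_{2,\theta_0}, \sigma^{-1}_{3,\theta_0})$
is the inverse of
$\sigma_{\theta_0} = (\sigma_{1,\theta_0}, \sigma_{2,\theta_0}, \sigma_{3,\theta_0})$. □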

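We now consider the problem of estimating the asymptotic variance of the
NPML estimator. From Theorem 2, the asymptotic variance of

$(\sqrt{n}(\hat\alpha_n - \alpha_0), \sqrt{n}(\hat\beta_n - \beta_0), \sqrt{n}(\hat\Lambda_n - \Lambda_0))(h)$

is
$\int_0^\tau h_3(u) \sigma^{-1}_{3,\theta_0}(h)(u)\,d\Lambda_0(u) + \sigma^{-1}_{2,\theta_0}(h) h_2 + \sigma^{-1}_{1,\theta_0}(h)^T h_1$.
Using formulas (4.5), we propose to first estimate $\sigma_{\theta_0}$ by
$\hat\sigma_{\hat\theta_n} = (\hat\sigma_{1,\hat\theta_n}, \hat\sigma_{2,\hat\theta_n}, \hat\sigma_{3,\hat\theta_n})$,
where

$\hat\sigma_{1,\hat\theta_n}(h) = -\frac{1}{n} \sum_{i=1}^n E_{\hat\theta_n}\left[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T} \ln f(Z_0, \ldots, Z_{a_X}, Z; \hat\alpha_n) \mid y_i\right] h_1$,

$\hat\sigma_{2,\hat\theta_n}(h) = \frac{1}{n} \sum_{i=1}^n E_{\hat\theta_n}\left[\int_0^X Z(u) e^{\hat\beta_n Z(u)} [h_2 Z(u) + h_3(u)]\,d\hat\Lambda_n(u) \mid y_i\right]$,

$\hat\sigma_{3,\hat\theta_n}(h)(u) = \frac{1}{n} \sum_{i=1}^n E_{\hat\theta_n}[(Z(u) h_2 + h_3(u)) e^{\hat\beta_n Z(u)} 1_{\{u \le X\}} \mid y_i]$.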

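We then propose to estimate the asymptotic variance by 

(4.10)  $\int_0^\tau h_3(u)\hat\sigma^{-1}_{3,\hat\theta_n}(h)(u)\,d\hat\Lambda_n(u) + \hat\sigma^{-1}_{2,\hat\theta_n}(h)h_2 + \hat\sigma^{-1}_{1,\hat\theta_n}(h)^T h_1.$ 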

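Using the same arguments as in the proof of Lemma 2, we are able to show that the functions under the sums in $\hat\sigma_{1,\hat\theta_n}, \hat\sigma_{2,\hat\theta_n}, \hat\sigma_{3,\hat\theta_n}$ form Donsker classes (for $h \in H_p$, $p > 0$), hence $\sup_{h\in H_p}\|\hat\sigma_{\hat\theta_n}(h) - \sigma_{\theta_0}(h)\|_H \to 0$. 

$\sigma_{\theta_0}$ is continuously invertible, hence, for all $H_p \subset H_\infty$ there exists $H_q \subset H_\infty$ such that $\sigma^{-1}_{\theta_0}(H_q) \subset H_p$, and for all $g \in H_q$ there exists $h \in H_p$ such that $h = \sigma^{-1}_{\theta_0}(g)$. 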

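Then, for $g \in H_q$ and $h = \sigma^{-1}_{\theta_0}(g)$, 

$$\|\hat\sigma^{-1}_{\hat\theta_n}(g) - \sigma^{-1}_{\theta_0}(g)\|_H = \|\hat\sigma^{-1}_{\hat\theta_n}(\sigma_{\theta_0}(h)) - \hat\sigma^{-1}_{\hat\theta_n}(\hat\sigma_{\hat\theta_n}(h))\|_H \le \sup_{h\in H_q}\|\hat\sigma^{-1}_{\hat\theta_n}(h)\|_H \,\sup_{h\in H_p}\|\sigma_{\theta_0}(h) - \hat\sigma_{\hat\theta_n}(h)\|_H.$$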



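It follows that $\sup_{g\in H_q}\|\hat\sigma^{-1}_{\hat\theta_n}(g) - \sigma^{-1}_{\theta_0}(g)\|_H \to 0$ and that the sequence of estimators (4.10) converges to the limit $\int_0^\tau h_3(u)\sigma^{-1}_{3,\theta_0}(h)(u)\,d\Lambda_0(u) + \sigma^{-1}_{2,\theta_0}(h)h_2 + \sigma^{-1}_{1,\theta_0}(h)^T h_1$. 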

In the above framework, specific choices of $h$ allow one to estimate the asymptotic variance of any particular estimator. For example, by setting $h_\beta = (0,1,0)$, one may obtain the following convergent estimator of the asymptotic variance of $\sqrt{n}(\hat\beta_n - \beta_0)$: $\hat\sigma^{-1}_{2,\hat\theta_n}(h_\beta)$. 
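The role played by the choice of $h$ can be illustrated on a finite-dimensional analogue. The sketch below is only an illustration under stated assumptions: it supposes that the estimated operator has been discretized into a square matrix `sigma_hat` acting on the vector $(h_1, h_2, h_3(u_1), \dots, h_3(u_m))$, where $u_1, \dots, u_m$ are the jump points of $\hat\Lambda_n$. The discretization, the function names and the toy numbers are hypothetical and do not come from the paper.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [[float(v) for v in a[i]] + [float(b[i])] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is numerically singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def plug_in_variance(sigma_hat, h, d_alpha, jump_sizes):
    """Discrete analogue of (4.10) for a direction h (hypothetical layout).

    h is ordered as (h_1 [d_alpha entries], h_2, h_3 at each jump point).
    """
    g = solve_linear(sigma_hat, h)  # g plays the role of sigma_hat^{-1}(h)
    # The h_3 part is integrated against the jumps of the hazard estimate;
    # the Euclidean parts enter with weight one.
    weights = [1.0] * (d_alpha + 1) + list(jump_sizes)
    return sum(w * hi * gi for w, hi, gi in zip(weights, h, g))


# Toy example: one alpha coordinate, beta, and a single jump point.
sigma_hat = [[2.0, 1.0, 0.0],
             [1.0, 2.0, 0.0],
             [0.0, 0.0, 5.0]]
h_beta = [0.0, 1.0, 0.0]  # picks out the beta coordinate
var_beta = plug_in_variance(sigma_hat, h_beta, d_alpha=1, jump_sizes=[0.1])
```

With $h_\beta$ only the $\beta$ entry of the solved system contributes, which mirrors the reduction of (4.10) to a single term. The system is solved rather than inverted because only one direction is needed.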



5. Discussion. In this paper we have described a joint modeling approach for estimation in the Cox model with missing values of a time-dependent covariate. We have used nonparametric maximum likelihood estimation. Using the theory of empirical processes and techniques developed by Murphy [23, 24] and van der Vaart and Wellner [33], we have shown that the proposed estimators are consistent and asymptotically normal. Moreover, we have proposed a consistent estimator of the asymptotic variance. 

An alternative, widely used approach to the modeling of longitudinal data in joint models assumes a normal random effects model for the repeated measurements (see among others Henderson, Diggle and Dobson [17], Tsiatis, DeGruttola and Wulfsohn [31] and Wulfsohn and Tsiatis [34]). Tsiatis and Davidian [29] estimate parameters in a joint model without requiring any distributional assumption on the random effects, and an informal proof of large-sample properties of estimators in the proportional hazards model is given. However, a formal theoretical justification of asymptotic properties for the maximum likelihood estimator in joint models with random effects for longitudinal data is not yet available, and should be a subject for further work. 

APPENDIX 



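Proof of Proposition 1. In the following, a.e. will stand for almost everywhere. Using (2.2), $\ln L(\theta) = \ln L(\theta')$ a.e. can be reexpressed as 

(A.1)  $\delta\ln\Bigl[\frac{d\Lambda}{d\Lambda'}(x)\Bigr] + \ln\int\exp\Bigl[\delta\beta z - \int_t^x e^{\beta z(u)}\,d\Lambda(u)\Bigr]f(z_0,\dots,z_{a_x},z;\alpha)\,dz - \ln\int\exp\Bigl[\delta\beta' z - \int_t^x e^{\beta' z(u)}\,d\Lambda'(u)\Bigr]f(z_0,\dots,z_{a_x},z;\alpha')\,dz = \int_0^t[e^{\beta z(u)}\,d\Lambda(u) - e^{\beta' z(u)}\,d\Lambda'(u)]$  a.e. 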



for $t < x$ and $x \in (0,\tau]$. The LHS of (A.1) depends on the path of the longitudinal covariate only through $\{z(u), t < u < x\}$ and $\{z_0,\dots,z_{a_x}\}$. Hence, for given $\{z(u) : t < u < x\}$ and $\{z_0,\dots,z_{a_x}\}$, the RHS of (A.1) should yield the same value for two different paths $z(u)$ and $z^*(u)$ ($0 < u < t$) taking the same values $\{z_0,\dots,z_{a_t}\}$ at $t_0,\dots,t_{a_t}$. This can be expressed as 

$$\int_0^t[e^{\beta z(u)}\,d\Lambda(u) - e^{\beta' z(u)}\,d\Lambda'(u)] = \int_0^t[e^{\beta z^*(u)}\,d\Lambda(u) - e^{\beta' z^*(u)}\,d\Lambda'(u)].$$

Letting $z(u) = \xi$ and $z^*(u) = \xi + h$ ($h > 0$) in $[0,t]$ except at $t_0,\dots,t_{a_t}$ [where $z(u)$ and $z^*(u)$ take values $z_0,\dots,z_{a_t}$], and possibly also at no more than a countable number of time points in $[0,t]$, the following holds: $e^{(\beta-\beta')\xi} = \Lambda'(t)(1 - e^{\beta' h})/[\Lambda(t)(1 - e^{\beta h})]$. For a fixed $h$, the RHS of this expression is independent of $\xi$. It follows that $\beta = \beta'$ and then $\Lambda = \Lambda'$. Rewriting $\ln L(\theta) = \ln L(\theta')$ with $\beta = \beta'$ and $\Lambda = \Lambda'$ leads to $\alpha = \alpha'$. □ 
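
The algebra behind the ratio used in this proof can be spelled out as follows. This is a sketch that assumes $\Lambda$ and $\Lambda'$ put no mass on the countable exceptional set, so that the constant paths $z(u) = \xi$ and $z^*(u) = \xi + h$ can be integrated directly, and that $\beta \neq 0$ and $\Lambda(t) > 0$, so that the division in the last step is legitimate.

```latex
\begin{align*}
e^{\beta\xi}\Lambda(t) - e^{\beta'\xi}\Lambda'(t)
  &= e^{\beta(\xi+h)}\Lambda(t) - e^{\beta'(\xi+h)}\Lambda'(t), \\
e^{\beta\xi}\Lambda(t)\,(1 - e^{\beta h})
  &= e^{\beta'\xi}\Lambda'(t)\,(1 - e^{\beta' h}), \\
e^{(\beta-\beta')\xi}
  &= \frac{\Lambda'(t)\,(1 - e^{\beta' h})}{\Lambda(t)\,(1 - e^{\beta h})}.
\end{align*}
```

Once $\beta = \beta'$ is established, the second line reduces to $\Lambda(t) = \Lambda'(t)$.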

Proof of Proposition 2. Recall that $\alpha$ and $\beta$ are interior points of some known compact sets $\mathcal{A}$ and $\mathcal{B}$. Suppose first that $\Delta\Lambda_{n,i} \le U$ [$i = 1,\dots,p(n)$] for some finite $U$. Since $L_n$ is a continuous function of $\alpha$, $\beta$ and of the jump sizes $\Delta\Lambda_{n,i}$ [$i = 1,\dots,p(n)$] on the compact set $\mathcal{A}\times\mathcal{B}\times[0,U]^{p(n)}$, $L_n$ achieves its maximum on this set. 

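To show that a maximum of $L_n$ exists on $\mathcal{A}\times\mathcal{B}\times[0,\infty)^{p(n)}$, we show that there exists a finite $U$ such that, for all $\theta_U = (\alpha_U, \beta_U, \Delta\Lambda_{n,1,U}, \dots, \Delta\Lambda_{n,p(n),U}) \in \{\mathcal{A}\times\mathcal{B}\times[0,\infty)^{p(n)}\}\setminus\{\mathcal{A}\times\mathcal{B}\times[0,U]^{p(n)}\}$, there exists $\theta = (\alpha, \beta, \Delta\Lambda_{n,1}, \dots, \Delta\Lambda_{n,p(n)}) \in \mathcal{A}\times\mathcal{B}\times[0,U]^{p(n)}$ such that $L_n(\theta) > L_n(\theta_U)$. 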

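A proof by contradiction is adopted for this purpose. Assume that, for all $U$, there exists $\theta_U = (\alpha_U, \beta_U, \Delta\Lambda_{n,1,U}, \dots, \Delta\Lambda_{n,p(n),U}) \in \{\mathcal{A}\times\mathcal{B}\times[0,\infty)^{p(n)}\}\setminus\{\mathcal{A}\times\mathcal{B}\times[0,U]^{p(n)}\}$ such that, for all $\theta = (\alpha, \beta, \Delta\Lambda_{n,1}, \dots, \Delta\Lambda_{n,p(n)}) \in \mathcal{A}\times\mathcal{B}\times[0,U]^{p(n)}$, $L_n(\theta) < L_n(\theta_U)$. Under conditions C1-C7, it is easily seen that the likelihood $L_n(\theta)$ is bounded above by 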



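$$\prod_{i=1}^n (M\,\Delta\Lambda_n(x_i))^{\delta_i}\exp\Bigl(-m\sum_{k=1}^{p(n)}\Delta\Lambda_{n,k}1_{\{x_k\le x_i\}}\Bigr)f(z_{i,0},\ldots),$$

where $m = \min_{\mathcal{B},i,k} e^{\beta z_{i,k}}$ and $M = \max_{\mathcal{B},i,k} e^{\beta z_{i,k}}$. 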

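If $\theta_U = (\alpha_U, \beta_U, \Delta\Lambda_{n,1,U}, \dots, \Delta\Lambda_{n,p(n),U}) \in \{\mathcal{A}\times\mathcal{B}\times[0,\infty)^{p(n)}\}\setminus\{\mathcal{A}\times\mathcal{B}\times[0,U]^{p(n)}\}$, then there exists $j$ [$j \in \{1,\dots,p(n)\}$] such that $\Delta\Lambda_{n,j,U} > U$. Hence, there exists at least one $i_U$ ($i_U \in \{1,\dots,n\}$) such that $\sum_{k=1}^{p(n)}\Delta\Lambda_{n,k,U}1_{\{x_k\le x_{i_U}\}} > U$. Hence, $\sum_{k=1}^{p(n)}\Delta\Lambda_{n,k,U}1_{\{x_k\le x_{i_U}\}} \to +\infty$ as $U \to +\infty$. It follows that the upper bound of $L_n(\theta_U)$ [and, hence, $L_n(\theta_U)$] can be made as close to 0 as desired by increasing $U$. This is the desired contradiction. □ 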
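
Why the upper bound vanishes can be seen on a single factor. The following is a simplified sketch in which $u$ stands for one jump size and all other jumps are held fixed. The reduction to one factor, the symbol $\varphi$ and the assumption that the density factors are bounded are introduced here for illustration and are not part of the original argument.

```latex
\[
\varphi(u) = (Mu)^{\delta} e^{-mu}, \qquad u \ge 0,\quad \delta \in \{0,1\},\quad m, M > 0 .
\]
For $\delta = 1$, $\varphi'(u) = M e^{-mu}(1 - mu)$, so that
\[
\sup_{u \ge 0} \varphi(u) = \varphi(1/m) = \frac{M}{em},
\qquad
\lim_{u \to +\infty} \varphi(u) = 0 ,
\]
and for $\delta = 0$, $\varphi(u) = e^{-mu} \le 1$ with the same limit.
```

Each factor of the product is therefore bounded, while the factor attached to $i_U$ is at most $\max(1, Ms)e^{-ms}$ with $s > U$, which tends to 0 as $U$ grows.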

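Proof of Proposition 3. The maximizer $\hat\theta_n$ of $\sum_{j=1}^n \ln f_Y(y_j;\theta)$ over $\theta \in \Theta_n$ satisfies 

$$\sum_{i=1}^n \frac{\partial}{\partial\theta}\bigl[E_{\hat\theta_n}[\ln f_{Y,Z}(Y,Z;\theta)\mid y_i]\bigr]\Big|_{\theta=\hat\theta_n} = 0.$$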

This result can be obtained by using the same argument that Dempster, Laird and Rubin [9] used to derive the principle of the EM algorithm. Its proof is therefore omitted. Discarding from $E_{\hat\theta_n}[\ln f_{Y,Z}(Y,Z;\theta)\mid y_j]$ the terms pertaining to censoring does not influence maximization, hence we can write that 

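$$\sum_{j=1}^n \frac{\partial}{\partial\theta}L_{\hat\theta_n}(\theta)\Big|_{\theta=\hat\theta_n} = 0.$$

Letting $\theta \in \Theta_n$, we note that $L_{\hat\theta_n}(\theta)$ is equal to 

$$E_{\hat\theta_n}\Bigl[\delta_j\beta Z - \sum_{k=1}^{p(n)}\Delta\Lambda_{n,k}e^{\beta Z(x_k)}1_{\{x_k\le x_j\}} + \delta_j\ln\Delta\Lambda_{n,j} + \ln f(Z_0,\dots,Z_{a_x},Z;\alpha)\Bigm| y_j\Bigr].$$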



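Summing this expression over $j$ ($j = 1,\dots,n$), differentiating with respect to $\Delta\Lambda_{n,i}$ and setting the derivative equal to 0 gives $\Delta\Lambda_{n,i} = 1/[\sum_{j=1}^n E_{\hat\theta_n}[e^{\hat\beta_n Z(X_i)}1_{\{X_i\le X\}}\mid y_j]]$. 

The cumulative baseline hazard can be estimated by $\hat\Lambda_n(t) = \sum_{i=1}^{p(n)}\Delta\Lambda_{n,i}1_{\{X_i\le t\}}$. Using $H_n$ and $W_n$ given in Proposition 3, this can further be written as 

(A.2)  $\hat\Lambda_n(t) = \int_0^t \frac{dH_n(u)}{W_n(u;\hat\theta_n)}.$ □ 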
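
The step function obtained in this proof has the same shape as the Breslow estimator. The Python sketch below illustrates it under a simplifying assumption that is not made in the paper: the covariate path is fully observed, so each conditional expectation reduces to the observed value $e^{\beta Z_j(u)}1_{\{u \le X_j\}}$. It also assumes no tied event times. The function names and the `covariate(j, u)` callback are hypothetical.

```python
import math


def breslow_jumps(times, events, covariate, beta):
    """Jump sizes of the cumulative baseline hazard at uncensored times.

    times[j]        : observed time X_j
    events[j]       : 1 if X_j is an event time, 0 if censored
    covariate(j, u) : value Z_j(u) of subject j's covariate at time u
    """
    jumps = []
    for x_i, d_i in zip(times, events):
        if not d_i:
            continue  # the estimator jumps only at uncensored times
        # Sum of e^{beta Z_j(x_i)} over subjects still at risk at x_i.
        at_risk = sum(math.exp(beta * covariate(j, x_i))
                      for j, x_j in enumerate(times) if x_i <= x_j)
        jumps.append((x_i, 1.0 / at_risk))
    return sorted(jumps)


def cumulative_hazard(jumps, t):
    """Step function: sum of the jumps located at or before t."""
    return sum(size for time, size in jumps if time <= t)


# Toy data: three subjects, the second one censored, constant covariate 0.
toy_jumps = breslow_jumps([1.0, 2.0, 3.0], [1, 0, 1], lambda j, u: 0.0, 0.0)
```

With $\beta = 0$ the jumps are the reciprocals of the risk-set sizes, that is, the Nelson-Aalen increments. In the missing-covariate setting of the paper, the exponential terms would instead be conditional expectations given the observed data.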



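Proof of Proposition 4. Letting $\tilde\theta$ be some value of $\theta$ and $\theta_t = (\alpha_t, \beta_t, \Lambda_t)$, $L_{\tilde\theta}(\theta_t)$ is equal to 

$$L_{\tilde\theta}(\theta_t) = \delta\ln[(1 + th_3(x))\,d\Lambda(x)] - E_{\tilde\theta}\Bigl[\int_0^X (1 + th_3(u))e^{(\beta + th_2)Z(u)}\,d\Lambda(u)\Bigm| y\Bigr] + E_{\tilde\theta}[\Delta[\beta + th_2]Z\mid y] + E_{\tilde\theta}[\ln f(Z_0,\dots,Z_{a_x},Z;\alpha + th_1)\mid y].$$

Then 

$$\frac{\partial}{\partial t}L_{\tilde\theta}(\theta_t) = \frac{\delta h_3(x)}{1 + th_3(x)} - E_{\tilde\theta}\Bigl[\int_0^X h_3(u)e^{(\beta + th_2)Z(u)}\,d\Lambda(u)\Bigm| y\Bigr] + h_2E_{\tilde\theta}\Bigl[\Delta Z - \int_0^X Z(u)(1 + th_3(u))e^{(\beta + th_2)Z(u)}\,d\Lambda(u)\Bigm| y\Bigr] + h_1^T E_{\tilde\theta}\Bigl[\frac{\partial}{\partial\alpha}\ln f(Z_0,\dots,Z_{a_x},Z;\alpha + th_1)\Bigm| y\Bigr].$$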



Letting $t = 0$ in this derivative and using notation previously defined for $S_{n,\tilde\theta,1}(\theta)$, $S_{n,\tilde\theta,2}(\theta)$ and $S_{n,\tilde\theta,3}(\theta)$, Proposition 4 immediately follows. 

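We verify that $S_{n,\hat\theta_n}(\hat\theta_n) = 0$. To see this, recall (proof of Proposition 3) that the maximizer $\hat\theta_n$ of $\sum_{i=1}^n \ln f_Y(y_i;\theta)$ over $\Theta_n$ satisfies 

$$\frac{1}{n}\sum_{i=1}^n \frac{\partial}{\partial\theta}L_{\hat\theta_n}(\theta)\Big|_{\theta=\hat\theta_n} = 0, \quad\text{equivalently,}\quad \frac{1}{n}\sum_{i=1}^n s_{\hat\theta_n}(y_i,\hat\theta_n)(h) = 0.$$

Hence, $S_{n,\hat\theta_n}(\hat\theta_n)(h) = 0$. 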

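Note also that $S_{\theta_0}(\theta_0) = 0$. To see this, recall that, in Proposition 1, it was shown that the model is identifiable, that is, the function $t \mapsto E_{\theta_0}[\ln f_Y(Y;\theta_{0,t})]$ has a unique maximum at $t = 0$ [where $\theta_{0,t} = (\alpha_{0,t}, \beta_{0,t}, \Lambda_{0,t})$]. 

Since $E_{\theta_0}[\frac{\partial}{\partial t}\ln f_Y(Y;\theta_{0,t})|_{t=0}] = 0$ and 

$$\frac{\partial}{\partial t}\ln f_Y(y;\theta_{0,t})\Big|_{t=0} = \frac{\partial}{\partial t}E_{\theta_0}[\ln f_{Y,Z}(y,Z;\theta_{0,t})\mid y]\Big|_{t=0},$$

it follows that $E_{\theta_0}[\frac{\partial}{\partial t}[E_{\theta_0}[\ln f_{Y,Z}(Y,Z;\theta_{0,t})\mid y]]|_{t=0}] = 0$. By discarding the terms pertaining to censoring from $E_{\theta_0}[\ln f_{Y,Z}(Y,Z;\theta_{0,t})\mid y]$, it follows that $E_{\theta_0}[\frac{\partial}{\partial t}L_{\theta_0}(\theta_{0,t})|_{t=0}] = 0$. Hence $S_{\theta_0}(\theta_0)(h) = 0$. □ 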
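
The link between the observed-data score and the conditional expectation of the complete-data score, on which the proofs of Propositions 3 and 4 rely, can be checked in a few lines. The derivation below is a sketch that assumes differentiation and integration can be interchanged; the generic parameter $\theta$ and the reference value $\theta'$ are notation introduced here.

```latex
\begin{align*}
\frac{\partial}{\partial\theta}\ln f_Y(y;\theta)
  &= \frac{1}{f_Y(y;\theta)}\int \frac{\partial}{\partial\theta} f_{Y,Z}(y,z;\theta)\,dz \\
  &= \int \Bigl[\frac{\partial}{\partial\theta}\ln f_{Y,Z}(y,z;\theta)\Bigr]
        \frac{f_{Y,Z}(y,z;\theta)}{f_Y(y;\theta)}\,dz \\
  &= E_{\theta}\Bigl[\frac{\partial}{\partial\theta}\ln f_{Y,Z}(y,Z;\theta)\Bigm| y\Bigr]
   = \frac{\partial}{\partial\theta} E_{\theta'}\bigl[\ln f_{Y,Z}(y,Z;\theta)\bigm| y\bigr]\Big|_{\theta'=\theta}.
\end{align*}
```

In the last expression the conditional distribution is held fixed at $\theta'$ while the derivative acts only on the complete-data log-density, which is the form in which the identity is used with $\theta' = \hat\theta_n$ or $\theta' = \theta_0$.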

In the following lemma, we recall Theorem 3.3.1 of [33], which will serve 
as a basis for our proof of asymptotic normality. 

Lemma A.1 (Theorem 3.3.1 of [33]). Let $S_n$ and $S$ be random maps and a fixed map, respectively, from $\Psi$ into a Banach space such that 

(A.3)  $\sqrt{n}(S_n - S)(\hat\psi_n) - \sqrt{n}(S_n - S)(\psi_0) = o_P(1 + \sqrt{n}\|\hat\psi_n - \psi_0\|),$ 

and such that the sequence $\sqrt{n}(S_n - S)(\psi_0)$ converges in distribution to a tight random element $Z$. Let $\psi \mapsto S(\psi)$ be Fréchet-differentiable at $\psi_0$ with a continuously invertible derivative $\dot S(\psi_0)$. If $S(\psi_0) = 0$ and $\hat\psi_n$ satisfies $S_n(\hat\psi_n) = o_P(n^{-1/2})$ and $\hat\psi_n - \psi_0 = o_P(1)$, then 

$$-\dot S(\psi_0)\sqrt{n}(\hat\psi_n - \psi_0) = \sqrt{n}(S_n - S)(\psi_0) + o_P(1).$$

The following two technical lemmas will be useful for our proof. 

Lemma A.2. For any finite $p$, the following holds: if $\theta \in l^\infty(H_p)$, then $p(\|\alpha\| \vee |\beta| \vee V_{[0,\tau]}(\Lambda)) \le \|\theta\|_p \le 3p(\|\alpha\| \vee |\beta| \vee V_{[0,\tau]}(\Lambda))$. 

This lemma can easily be proved and its proof is therefore omitted. See 
[11] for details. 

Lemma A.3. The operator $\sigma_{\theta_0}$ is such that $\mathrm{Ker}(\sigma_{\theta_0}) = \{0\}$. 

Proof. Suppose that $\sigma_{\theta_0}(h) = 0$ for some $h = (h_1, h_2, h_3)$. Then $\sigma_{1,\theta_0}(h) = 0$. It follows that $h_1^T\sigma_{1,\theta_0}(h) = 0$. Since $-E_{\theta_0}[\frac{\partial^2}{\partial\alpha\,\partial\alpha^T}\ln f(Z_0,\dots,Z;\alpha_0)]$ is positive definite (condition C6), it follows that $h_1 = 0$. 
If we assume $\sigma_{\theta_0}(h) = 0$, then 

$$\int_0^\tau h_3(u)\sigma_{3,\theta_0}(h)(u)\,d\Lambda_0(u) + h_2\sigma_{2,\theta_0}(h) + h_1^T\sigma_{1,\theta_0}(h) = 0,$$

which we can rewrite as $E_{\theta_0}[(s_{\theta_0}(Y,\theta_0)(h))^2] = 0$. From this, it follows that $s_{\theta_0}(y,\theta_0)(h) = 0$ a.e. Using the fact that $h_1 = 0$, and proceeding as in the proof of identifiability, we can show that $h_2 = 0$. 

At this stage, we get that $h_1 = 0$ and $h_2 = 0$. Let $h = (0,0,h_3)$. Then 

$$\sigma_{3,\theta_0}(h)(u) = h_3(u)E_{\theta_0}[e^{\beta_0 Z(u)}1_{\{u\le X\}}] = 0 \quad\text{for all } u.$$

It follows from condition C6 that $h_3(u) = 0$ for all $u$. We conclude that $\sigma_{\theta_0}$ is one-to-one. □ 